
    Revisiting the tree edit distance and its backtracing: A tutorial

    Almost 30 years ago, Zhang and Shasha (1989) published a seminal paper describing an efficient dynamic programming algorithm for computing the tree edit distance, that is, the minimum number of node deletions, insertions, and replacements necessary to transform one tree into another. Since then, the tree edit distance has been widely applied, for example in biology and intelligent tutoring systems. However, the original paper of Zhang and Shasha can be challenging to read for newcomers, and it does not describe how to efficiently infer the optimal edit script. In this contribution, we provide a comprehensive tutorial on the tree edit distance algorithm of Zhang and Shasha. We further prove metric properties of the tree edit distance, and describe efficient algorithms to infer the cheapest edit script, as well as a summary of all cheapest edit scripts between two trees. Comment: Supplementary material for the ICML 2018 paper: Tree Edit Distance Learning via Adaptive Symbol Embeddings
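
    As a companion to the tutorial's subject, here is a minimal Python sketch of the forest edit distance recurrence that Zhang and Shasha's algorithm evaluates efficiently. It assumes unit edit costs and represents trees as nested tuples; the memoized recursion only illustrates the recurrence, whereas the actual algorithm restricts computation to keyroot subproblems for efficiency.

```python
# A minimal sketch of the forest edit distance recurrence underlying the
# Zhang-Shasha algorithm, with unit costs. Trees are nested, hashable
# tuples: (label, (child, child, ...)). This naive memoized version is
# for illustration only; Zhang and Shasha evaluate the same recurrence
# far more efficiently via keyroot decomposition.
from functools import lru_cache

def tree_edit_distance(t1, t2):
    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:  # insert the rightmost root of f2, exposing its children
            _, children = f2[-1]
            return forest_dist(f1, f2[:-1] + children) + 1
        if not f2:  # delete the rightmost root of f1, exposing its children
            _, children = f1[-1]
            return forest_dist(f1[:-1] + children, f2) + 1
        (l1, c1), (l2, c2) = f1[-1], f2[-1]
        delete = forest_dist(f1[:-1] + c1, f2) + 1
        insert = forest_dist(f1, f2[:-1] + c2) + 1
        # match/relabel the two rightmost roots; their subtrees must align
        match = (forest_dist(c1, c2)
                 + forest_dist(f1[:-1], f2[:-1])
                 + (0 if l1 == l2 else 1))
        return min(delete, insert, match)
    return forest_dist((t1,), (t2,))

# Example: f(a, b) vs. f(a, c) differ by one leaf relabeling.
t1 = ("f", (("a", ()), ("b", ())))
t2 = ("f", (("a", ()), ("c", ())))
print(tree_edit_distance(t1, t2))  # -> 1
```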

    Tree Edit Distance Learning via Adaptive Symbol Embeddings

    Metric learning aims to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance which violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which we call embedding edit distance learning (BEDL) and which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that BEDL improves upon the state of the art in metric learning for trees on six benchmark data sets, ranging from computer science and biomedical data to a natural-language processing data set containing over 300,000 nodes. Comment: Paper at the International Conference on Machine Learning (ICML 2018), 2018-07-10 to 2018-07-15 in Stockholm, Sweden
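
    The following is a hedged sketch of the core idea: deriving edit costs from a label embedding, so that the resulting cost function satisfies metric properties by construction. The embedding table below is illustrative rather than learned, and a sequence edit distance stands in for the tree case for brevity; BEDL itself learns the embeddings from prototypical trees.

```python
# Sketch: edit costs induced by a (hypothetical, hand-picked) label
# embedding. Substitution costs the Euclidean distance between label
# vectors; deletion/insertion costs the distance to the origin. BEDL
# learns such embeddings; here they are fixed for illustration, and a
# sequence edit distance replaces the tree edit distance for brevity.
import numpy as np

embedding = {
    "a": np.array([0.0, 0.0]),
    "b": np.array([1.0, 0.0]),
    "c": np.array([0.9, 0.1]),  # close to "b", so b <-> c is cheap
}

def sub_cost(x, y):
    return float(np.linalg.norm(embedding[x] - embedding[y]))

def del_cost(x):
    return float(np.linalg.norm(embedding[x]))

def edit_distance(s, t):
    # standard edit distance DP with the embedding-induced costs
    n, m = len(s), len(t)
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        D[i, 0] = D[i - 1, 0] + del_cost(s[i - 1])
    for j in range(1, m + 1):
        D[0, j] = D[0, j - 1] + del_cost(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + del_cost(s[i - 1]),
                          D[i, j - 1] + del_cost(t[j - 1]),
                          D[i - 1, j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return D[n, m]

print(edit_distance("ab", "ac"))  # small, since b and c embed close together
```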

    Adaptive Affine Sequence Alignment Using Algebraic Dynamic Programming

    Paaßen B. Adaptive Affine Sequence Alignment Using Algebraic Dynamic Programming. Bielefeld: Bielefeld University; 2015. A core issue in machine learning is the classification of data. However, for data structures that cannot easily be summarized in a feature representation, standard vectorial approaches are not suitable. An alternative approach is to represent the data not by features, but by their similarities or dissimilarities to each other. In the case of sequential data, dissimilarities can be efficiently calculated by well-established alignment distances. Recently, techniques have been put forward to adapt the parameters of such alignment distances to the specific data set at hand, e.g. using gradient descent on a cost function. In this thesis, we provide a comprehensive theory for gradient descent on alignment distances based on Algebraic Dynamic Programming, enabling us to adapt even sophisticated alignment distances. We focus on Affine Sequence Alignment, which we optimize by gradient descent on the Large Margin Nearest Neighbor cost function. Thereby we directly optimize the classification accuracy of the popular k-Nearest Neighbor classifier. We present a free software implementation of this theory, the TCS Alignment Toolbox, which we use for the subsequent experiments. Our experiments entail alignment distance learning on three diverse data sets (two artificial ones and one real-world example), yielding not only an increase in classification accuracy but also interpretable resulting parameter settings.
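
    To make the setting concrete, here is a minimal sketch of affine sequence alignment via Gotoh-style dynamic programming, in which opening a gap costs more than extending one. The cost constants are illustrative placeholders; the thesis treats such costs as differentiable parameters to be adapted by gradient descent on the Large Margin Nearest Neighbor cost.

```python
# A minimal sketch of affine sequence alignment (Gotoh-style DP) with
# fixed illustrative costs. M: both symbols aligned; X: gap in t;
# Y: gap in s. Gap-to-gap transitions between X and Y are omitted for
# simplicity, as befits a sketch rather than a full implementation.
def affine_alignment(s, t, mismatch=1.0, gap_open=2.0, gap_extend=0.5):
    n, m = len(s), len(t)
    INF = float("inf")
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gap in t
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in s
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + min(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1])
            X[i][j] = min(M[i-1][j] + gap_open, X[i-1][j] + gap_extend)
            Y[i][j] = min(M[i][j-1] + gap_open, Y[i][j-1] + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])

print(affine_alignment("AACGT", "ACT"))
```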

    Metric Learning for Structured Data

    Paaßen B. Metric Learning for Structured Data. Bielefeld: Universität Bielefeld; 2019. Distance measures form a backbone of machine learning and information retrieval in many application fields such as computer vision, natural language processing, and biology. However, general-purpose distances may fail to capture semantic particularities of a domain, leading to wrong inferences downstream. Motivated by such failures, the field of metric learning has emerged. Metric learning is concerned with learning a distance measure from data which pulls semantically similar data closer together and pushes semantically dissimilar data further apart. Over the past decades, metric learning approaches have yielded state-of-the-art results in many applications. Unfortunately, these successes are mostly limited to vectorial data, while metric learning for structured data remains a challenge. In this thesis, I present a metric learning scheme for a broad class of sequence edit distances which is compatible with any differentiable cost function, as well as a scalable, interpretable, and effective tree edit distance learning scheme, thus pushing the boundaries of metric learning for structured data. Furthermore, I make learned distances more useful by providing a novel algorithm to perform time series prediction solely based on distances, a novel algorithm to infer a structured datum from edit distances, and a novel algorithm to transfer a learned distance to a new domain using only a small amount of data and computation time. Finally, I apply these novel algorithms to two challenging application domains. First, I support students in intelligent tutoring systems: if a student gets stuck before completing a learning task, I predict how capable students would proceed in their situation and guide the student in that direction via edit hints. Second, I use transfer learning to counteract disturbances for bionic hand prostheses, making these prostheses more robust in patients' everyday lives.
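
    As an illustration of prediction based solely on distances, the following generic sketch performs kernel-weighted regression from precomputed dissimilarities to training data. It is a standard nonparametric scheme under an assumed Gaussian kernel, not the thesis's exact algorithm, which also handles structured outputs.

```python
# Sketch: predicting from distances alone. Given dissimilarities from a
# query to n training points and their known target values, predict via
# kernel-weighted averaging. The Gaussian kernel and bandwidth are
# illustrative assumptions, not the thesis's specific construction.
import numpy as np

def kernel_regression_from_distances(dists, targets, bandwidth=1.0):
    # dists: shape (n,) distances from the query to each training point
    # targets: shape (n,) or (n, k) known target values
    weights = np.exp(-np.asarray(dists) ** 2 / (2 * bandwidth ** 2))
    weights /= weights.sum()  # normalize to a convex combination
    return np.tensordot(weights, np.asarray(targets, dtype=float), axes=1)

# Toy usage: the closest training points dominate the prediction.
print(kernel_regression_from_distances([0.2, 1.5, 3.0], [1.0, 2.0, 5.0]))
```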

    Lecture Notes on Applied Optimization

    Paaßen B, Artelt A, Hammer B. Lecture Notes on Applied Optimization. Faculty of Technology, Bielefeld University; 2019. These lecture notes cover theory and algorithms for optimization from an application perspective. With respect to theory, we cover basic definitions of optimization problems and their solutions, necessary and sufficient conditions of optimality, convex problems and optimality under convexity, Lagrange and Wolfe dual forms, as well as Karush-Kuhn-Tucker conditions of optimality. With respect to algorithms, we cover analytical optimization; numeric optimization, especially (conjugate) gradient descent, (pseudo-)Newton, trust region, log-barrier, penalty, and projection methods; probabilistic optimization, especially expectation maximization and max-product; linear and quadratic programming; and heuristics, especially the Nelder-Mead algorithm, CMA-ES, Bayesian optimization, hill climbing, simulated annealing, tabu search, branch-and-cut, and ant colony optimization. As such, this document provides a comprehensive overview of the most important optimization techniques for a wide range of application domains, as well as their theoretical foundations.
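
    As a small worked example for one of the listed techniques, the following applies plain gradient descent to a convex quadratic, whose unique minimizer can be checked against the closed-form solution. The fixed step size is an illustrative choice, not a recommendation from the notes.

```python
# Gradient descent on the convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose unique minimizer solves A x = b. With A symmetric positive
# definite and a small enough fixed step size, the iteration converges.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
for _ in range(200):
    grad = A @ x - b   # gradient of f at the current iterate
    x -= 0.1 * grad    # fixed step size, chosen small enough to converge

print(x, np.linalg.solve(A, b))  # both approximate the same minimizer
```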

    Reservoir Memory Machines as Neural Computers

    Differentiable neural computers extend artificial neural networks with an explicit memory without interference, thus enabling the model to perform classic computation tasks such as graph traversal. However, such models are difficult to train, requiring long training times and large datasets. In this work, we achieve some of the computational capabilities of differentiable neural computers with a model that can be trained very efficiently, namely an echo state network with an explicit memory without interference. This extension enables echo state networks to recognize all regular languages, including those that contractive echo state networks provably cannot recognize. Further, we demonstrate experimentally that our model performs comparably to its fully-trained deep version on several typical benchmark tasks for differentiable neural computers. Comment: In print in the special issue 'New Frontiers in Extremely Efficient Reservoir Computing' of IEEE TNNLS
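
    For context, here is a hedged sketch of the echo state network backbone that the proposed model extends: a fixed random recurrent reservoir in which only the linear readout is trained, here by ridge regression on a one-step-ahead prediction task. The explicit memory component of the paper's model is omitted, and all hyperparameters are illustrative.

```python
# Sketch of a plain echo state network: random fixed reservoir, trained
# linear readout. The reservoir matrix is rescaled to spectral radius
# below 1 so the network is contractive (echo state property heuristic).
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 100, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def run_reservoir(u):
    # u: (T, n_in) input sequence; returns (T, n_res) reservoir states
    h = np.zeros(n_res)
    states = []
    for t in range(len(u)):
        h = np.tanh(W_in @ u[t] + W @ h)
        states.append(h)
    return np.array(states)

# Train the readout to predict the next input symbol (ridge regression).
u = np.sin(np.linspace(0, 20, 500)).reshape(-1, 1)
H, Y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(H.T @ H + 1e-6 * np.eye(n_res), H.T @ Y)
print(np.mean((H @ W_out - Y) ** 2))  # small training error expected
```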

    Adaptive structure metrics for automated feedback provision in intelligent tutoring systems

    Paaßen B, Mokbel B, Hammer B. Adaptive structure metrics for automated feedback provision in intelligent tutoring systems. Neurocomputing. 2016;192(SI):3-13. Typical intelligent tutoring systems rely on detailed domain knowledge which is hard to obtain and difficult to encode. As a data-driven alternative to explicit domain knowledge, one can present learners with feedback based on similar existing solutions from a set of stored examples. At the heart of such a data-driven approach is the notion of similarity. We present a general-purpose framework to construct structure metrics on sequential data and to adapt those metrics using machine learning techniques. We demonstrate that metric adaptation improves the classification of wrong versus correct learner attempts in a simulated data set from sports training, and the classification of the underlying learner strategy in a real Java programming dataset.
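
    To illustrate the data-driven feedback idea, the following sketch classifies a new learner attempt by k-nearest neighbors over precomputed distances to stored example solutions. The distance values are placeholders; in the paper's setting they would come from a learned alignment distance on sequential data.

```python
# Sketch: majority-vote k-nearest-neighbor classification from a vector
# of precomputed structure-metric distances. Labels and distances below
# are placeholder toy values, not data from the paper.
import numpy as np

def knn_classify(dist_to_train, train_labels, k=3):
    # dist_to_train: (n_train,) distances from the new attempt
    # train_labels: (n_train,) integer class labels of stored examples
    nearest = np.argsort(dist_to_train)[:k]
    votes = train_labels[nearest]
    return np.bincount(votes).argmax()  # majority label among k neighbors

train_labels = np.array([1, 1, 0, 0, 1])     # 1 = correct attempt
dist = np.array([0.3, 0.9, 1.2, 0.4, 0.8])   # placeholder distances
print(knn_classify(dist, train_labels))       # -> 1
```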

    Two or three things we do (not) know about distances

    Paaßen B. Two or three things we do (not) know about distances. In: Schleif F-M, Villmann T, eds. Proceedings of the Ninth Mittweida Workshop on Computational Intelligence (MiWoCI 2017). Machine Learning Reports. 2017: 32-33.